The analysis was performed on the Right Heart Catheterization (RHC) dataset, first analysed by Connors et al. (1996)
Their study focused on the effect of RHC on patients
They used propensity score matching to create an artificial control group
Their study found that patients undergoing RHC experienced shorter survival times.
Attribute data includes patient demographics, socioeconomic details, physiological parameters, disease-related information, and survival outcomes.
We performed our analysis using the Tidyverse.
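To make the idea of propensity score matching concrete, here is a minimal sketch in base R. It is not the procedure used by Connors et al.: the data are synthetic, the covariates `age` and `aps` and the greedy 1:1 nearest-neighbour rule are assumptions chosen only for illustration.

```r
# Synthetic toy data (NOT the RHC data); column names are hypothetical.
set.seed(123)
n <- 400
toy <- data.frame(
  age = rnorm(n, mean = 60, sd = 15),
  aps = rnorm(n, mean = 55, sd = 20)
)
# Treatment assignment depends on the covariates, as in an observational study.
toy$treated <- rbinom(n, size = 1,
                      prob = plogis(-3.5 + 0.02 * toy$age + 0.03 * toy$aps))

# Step 1: estimate the propensity score with a logistic regression.
ps_model <- glm(treated ~ age + aps, data = toy, family = binomial)
toy$ps <- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbour matching without replacement.
treated_idx <- which(toy$treated == 1)
available   <- which(toy$treated == 0)
matches     <- integer(0)
for (i in treated_idx) {
  if (length(available) == 0) break  # no controls left to match
  j <- available[which.min(abs(toy$ps[available] - toy$ps[i]))]
  matches   <- c(matches, j)
  available <- setdiff(available, j)
}

# Step 3: the matched sample holds each matched treated row and its control.
matched_treated <- treated_idx[seq_along(matches)]
matched <- rbind(toy[matched_treated, ], toy[matches, ])
```

The matched controls play the role of the artificial control group: outcomes can then be compared between the two halves of `matched`.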
Before cleaning and augmentation:
5735 patients
62 attributes
After cleaning and augmentation:
5612 patients
53 attributes
We familiarized ourselves with the data by extracting information about the attributes and making numerous plots
We created summaries of different attributes to find which ones make sense to analyse
We used histograms because they are easy to read and interpret while still showing a lot of information
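As an illustration of the kind of histogram described above, the following sketch uses ggplot2 on synthetic data. The data frame `toy`, its columns, and the bin width are assumptions for the example and do not reproduce the plots from our analysis.

```r
library(ggplot2)

# Synthetic toy data (NOT the RHC data); ages are clipped to 18-102.
set.seed(42)
toy <- data.frame(
  age   = pmin(pmax(rnorm(500, mean = 61, sd = 17), 18), 102),
  death = factor(rbinom(500, size = 1, prob = 0.65),
                 levels = c(0, 1), labels = c("Alive", "Dead"))
)

# Overlaid histograms of age, one per outcome group, in 5-year bins.
p <- ggplot(toy, aes(x = age, fill = death)) +
  geom_histogram(binwidth = 5, boundary = 0,
                 alpha = 0.6, position = "identity") +
  labs(x = "Age (years)", y = "Number of patients", fill = "Outcome")

print(p)
```

Setting `position = "identity"` overlays the two groups instead of stacking them, which makes it easier to compare the shapes of the distributions.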
rhc_aug |>
  mutate(sex = factor(sex),
         swang1 = factor(swang1),
         death = factor(x = death, levels = c(0, 1), labels = c("Alive", "Dead"))) |>
  table1(x = ~ sex + age + race + swang1 | death,
         data = _)

| | Alive (N=1972) | Dead (N=3640) | Overall (N=5612) |
|---|---|---|---|
| sex | | | |
| Female | 906 (45.9%) | 1594 (43.8%) | 2500 (44.5%) |
| Male | 1066 (54.1%) | 2046 (56.2%) | 3112 (55.5%) |
| age | | | |
| Mean (SD) | 56.6 (17.4) | 64.0 (15.7) | 61.4 (16.7) |
| Median [Min, Max] | 58.0 [18.0, 102] | 66.0 [18.0, 101] | 64.0 [18.0, 102] |
| race | | | |
| black | 323 (16.4%) | 577 (15.9%) | 900 (16.0%) |
| other | 121 (6.1%) | 223 (6.1%) | 344 (6.1%) |
| white | 1528 (77.5%) | 2840 (78.0%) | 4368 (77.8%) |
| swang1 | | | |
| 0 | 1291 (65.5%) | 2177 (59.8%) | 3468 (61.8%) |
| 1 | 681 (34.5%) | 1463 (40.2%) | 2144 (38.2%) |
# A tibble: 10 × 4
Diagnosis Coefficient Intercept p.value
<chr> <dbl> <dbl> <dbl>
1 multiple diagnosis 0.0180 -0.163 0.359
2 seps 0.0292 -1.30 0.00222
3 card 0.0257 -0.754 0.0000221
4 resp 0.0240 -0.871 0.0000290
5 renal 0.00959 0.429 0.684
6 hema 0.0191 0.185 0.863
7 gastr 0.0272 -0.771 0.00937
8 trauma 0.0170 -1.56 0.147
9 neuro 0.0183 -0.223 0.358
10 meta 0.0156 -0.552 0.468
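A table of this shape could be produced by fitting one logistic regression per diagnosis. The sketch below shows one way to do that with dplyr, tidyr and purrr. It runs on synthetic data, and it assumes a long-format table with a predictor named `aps1` and a 0/1 outcome `death`, and that `p.value` refers to the slope; none of this is confirmed by the output above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Synthetic long-format data (NOT the RHC data); `aps1` is an assumed name.
set.seed(1)
toy <- tibble(
  Diagnosis = rep(c("card", "resp", "seps"), each = 200),
  aps1      = round(runif(600, min = 10, max = 120)),
  death     = rbinom(600, size = 1, prob = plogis(-1 + 0.02 * aps1))
)

# Fit death ~ aps1 for one diagnosis and return a one-row summary.
fit_one <- function(df) {
  model <- glm(death ~ aps1, data = df, family = binomial(link = "logit"))
  est   <- coef(summary(model))
  tibble(
    Coefficient = est["aps1", "Estimate"],
    Intercept   = est["(Intercept)", "Estimate"],
    p.value     = est["aps1", "Pr(>|z|)"]  # assumption: p-value of the slope
  )
}

# Nest the data by diagnosis, fit each model, and unnest the summaries.
results <- toy |>
  group_by(Diagnosis) |>
  nest() |>
  mutate(fit = map(data, fit_one)) |>
  select(Diagnosis, fit) |>
  unnest(fit) |>
  ungroup()

print(results)
```

A positive `Coefficient` means that the estimated log-odds of death increase with the score within that diagnosis group.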
How come we found no major discoveries?
What could have been done differently?
We can conclude that PC can make sense for further analysis.
We can conclude that, for several diagnoses, higher APS values are associated with an increased risk of death.
Sources: https://hbiostat.org/data/repo/rhc
download: https://hbiostat.org/data/repo/rhc.csv
R for Bio Data Science